
This code can be found at https://github.com/libjohn/workshop_rfun_flipped/ggplot2_quick.Rmd

Load library packages

I only need ggplot2, but I like to load tidyverse because it attaches a core set of complementary packages, including ggplot2 and dplyr.

# library(ggplot2)
library(tidyverse)

Get more information from:

ggplot2 template code

The ggplot2 template identifies the dataframe, maps variables to the x and y axes, and defines the visualized layers.

ggplot(data = ---, mapping = aes(x = ---, y = ---)) + geom_----()

Note: --- and ---- are placeholders for text you supply: function names, dataframe names, and variable names.

Spelling out the argument names, as above, is helpful when learning. In practice, code is typically shortened by piping in the data and dropping the formal argument names:

dataframe %>% ggplot(aes(xvar, yvar)) + geom_----()
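As a quick check, the long form and the shorthand describe the same plot. This sketch uses the starwars data from dplyr. Comparing the computed layer data with ggplot2's layer_data() is just one way to confirm the two forms are equivalent. The object names p_long and p_short are illustrative.

```r
library(ggplot2)
library(dplyr)  # provides the starwars data and the %>% pipe

# Long form: every formal argument named
p_long <- ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point()

# Shorthand: data piped in, aes() arguments matched by position (x, then y)
p_short <- starwars %>%
  ggplot(aes(height, mass)) +
  geom_point()

# layer_data() returns the data ggplot2 computed for a layer
all.equal(layer_data(p_long), layer_data(p_short))
```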

Goal

Visualize a scatter plot showing the relationship of mass to height for Star Wars characters in the dplyr::starwars dataframe, excluding the heaviest character. Add a linear regression line.

Import data

dplyr ships with a built-in dataset, starwars.

data(starwars)
starwars

Steps to Visualization

Draw the base layer

This feels like, and looks like, you drew an empty box.

starwars %>% 
  ggplot() 

But wait, there’s more…

Map the aesthetics to variables in the dataframe

Still doesn’t look like much, but mapping the aesthetics initializes the plot scales and axis labels from the values of the mapped variables in the dataframe.

starwars %>% 
  filter(mass < 500) %>% 
  ggplot(aes(height, mass))

In the code above, I subset the data with dplyr::filter(), keeping only the Star Wars characters who weigh less than 500 kg. Then I initialized the base layer with height on the x axis and mass on the y axis, and ggplot drew the scales for me.
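To see which character the mass < 500 condition removes, sort by mass. Note that filter() keeps only rows where the condition is TRUE, so characters with a missing (NA) mass are dropped too. The names heaviest and sw are illustrative.

```r
library(dplyr)

# Sort characters from heaviest to lightest (NA masses sort last)
heaviest <- starwars %>%
  select(name, height, mass) %>%
  arrange(desc(mass))
head(heaviest, 3)

# filter() keeps only rows where the condition is TRUE,
# so characters with a missing (NA) mass are dropped as well
sw <- starwars %>% filter(mass < 500)
c(all = nrow(starwars), kept = nrow(sw),
  missing_mass = sum(is.na(starwars$mass)))
```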

Visualize a layer

Since I have two numeric variables, height and mass, I’ll start with a scatter plot. Scatter plots are generated by the geom_point() function.

starwars %>% 
  filter(mass < 500) %>% 
  ggplot(aes(height, mass)) +
  geom_point() 

Global v local arguments

  • Aesthetics mapped inside ggplot() are global: every layer inherits them (above)
  • Aesthetics can also be mapped locally, inside a single geom_ function, where they apply only to that layer

aes() arguments mapped locally in geom_point()

starwars %>% 
  filter(mass < 500) %>% 
  ggplot() +
  geom_point(aes(height, mass)) 
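The global/local distinction matters as soon as a plot has more than one layer. In this sketch, mapping color globally makes geom_smooth() inherit it and fit one line per gender. Mapping color locally in geom_point() leaves geom_smooth() fitting a single line. Counting the groups in the smooth layer's computed data, via layer_data(), is one way to see the difference. The object names are illustrative.

```r
library(ggplot2)
library(dplyr)

sw <- starwars %>% filter(mass < 500)

# color mapped globally: inherited by geom_point() AND geom_smooth(),
# so geom_smooth() fits one line per gender
p_global <- sw %>%
  ggplot(aes(height, mass, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# color mapped locally: only geom_point() uses it,
# so geom_smooth() fits a single line through all the points
p_local <- sw %>%
  ggplot(aes(height, mass)) +
  geom_point(aes(color = gender)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Count the fitted lines (groups) in the smooth layer, layer 2
n_distinct(layer_data(p_global, 2)$group)
n_distinct(layer_data(p_local, 2)$group)
```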

Mapping v Setting

Many arguments can either be mapped inside the aesthetic function, aes(), so that their values come from a variable, OR set to a fixed value outside aes() but inside the geom_ function.

Aesthetic arguments include:

  • color
  • fill
  • size
  • linetype
  • alpha (opacity)
  • shape
  • and more; see the documentation for each geom_

Mapping: color is mapped inside aes() function

starwars %>% 
  filter(mass < 500) %>% 
  ggplot() +
  # geom_point(mapping = aes(x = height, y = mass, color = gender))
  geom_point(aes(height, mass, color = gender))

Notice that mapping an aesthetic, above, drew the legend automatically.

Setting: color set outside the aes() function

starwars %>% 
  filter(mass < 500) %>% 
  ggplot() +
  geom_point(aes(height, mass), color = "goldenrod")
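A common slip is to put a fixed color inside aes(). ggplot2 then treats "goldenrod" as a one-level data value. It maps that value through the default color scale and adds a legend, so the points are not goldenrod at all. This sketch compares the colors ggplot2 actually computes in each case. The object names are illustrative.

```r
library(ggplot2)
library(dplyr)

sw <- starwars %>% filter(mass < 500)

# Mapping a constant: "goldenrod" becomes a data value,
# so the default discrete scale chooses the color and a legend appears
p_mapped <- sw %>%
  ggplot() +
  geom_point(aes(height, mass, color = "goldenrod"))

# Setting: the literal color is applied to every point, with no legend
p_set <- sw %>%
  ggplot() +
  geom_point(aes(height, mass), color = "goldenrod")

unique(layer_data(p_mapped)$colour)
unique(layer_data(p_set)$colour)
```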

Common geom_ functions

Type                 Geom
Bar graph            geom_bar(), geom_col()
Histogram            geom_histogram()
Scatter plot         geom_point(), geom_jitter()
Line graph           geom_line()
Box plot             geom_boxplot()
Density              geom_density(), geom_violin()
Heat map             geom_tile(), geom_raster()
Map (spatial data)   geom_sf()
Regression line      geom_smooth()

A list of available geom_ functions, or layers, can be found in the help or on the website: https://ggplot2.tidyverse.org/reference/index.html#section-geoms
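Two geoms in the table have names that are easy to misremember: histograms come from geom_histogram(), and a basic heat map from geom_tile(). Here is a minimal sketch of each, using starwars and the mpg dataset that ships with ggplot2. The binwidth of 10 and the class-by-drive-train counts are arbitrary choices for illustration.

```r
library(ggplot2)
library(dplyr)

# Histogram: geom_histogram() bins a single continuous variable
p_hist <- starwars %>%
  ggplot(aes(height)) +
  geom_histogram(binwidth = 10)

# Heat map: count cars by class and drive train, then fill each tile by count
p_tile <- mpg %>%
  count(class, drv) %>%
  ggplot(aes(drv, class, fill = n)) +
  geom_tile()
```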

Boxplot

starwars %>% 
  mutate(species = fct_lump_min(species, 2)) %>% 
  ggplot(aes(species, height)) +
  geom_boxplot() 
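In the boxplot, fct_lump_min(species, 2) keeps the species that appear at least twice and lumps the rest into an "Other" level, which keeps the x axis readable. Counting the lumped factor shows which categories the boxplot draws. The name lumped is illustrative.

```r
library(dplyr)
library(forcats)

# Species with fewer than 2 characters are collapsed into "Other"
lumped <- starwars %>%
  mutate(species = fct_lump_min(species, 2)) %>%
  count(species, sort = TRUE)

lumped
```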

Line graph

babynames::babynames %>% 
  filter(name == "Watts") %>% 
  ggplot(aes(year, n)) +
  # geom_point() +
  geom_line()

Overplotting

There are two simple approaches to visualizing overplotted data.

  • Adjust opacity: set the alpha argument to make the points semi-transparent. Where points overlap, they appear darker on the plot.
starwars %>% 
  filter(mass < 500) %>% 
  ggplot() +
  geom_point(aes(height, mass), alpha = .3)

  • Jitter the data with geom_jitter()

geom_jitter() does not change the values of the data. It adds a small amount of random noise to each point’s position, offsetting overlapping points so the overplotting is easier to perceive.

starwars %>% 
  filter(mass < 500) %>% 
  ggplot() +
  geom_jitter(aes(height, mass))
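geom_jitter() is a shortcut for geom_point(position = "jitter"). Spelling out position_jitter() lets you cap the offset (width and height, in data units) and fix the random seed, so the plot looks the same every time the document is knit. The width of 0.5, height of 0, and seed of 123 below are arbitrary choices.

```r
library(ggplot2)
library(dplyr)

sw <- starwars %>% filter(mass < 500)

# Jitter horizontally by at most +/- 0.5 cm, never vertically,
# with a fixed seed so the random offsets are reproducible
p_jit <- sw %>%
  ggplot(aes(height, mass)) +
  geom_point(position = position_jitter(width = 0.5, height = 0, seed = 123))

jit <- layer_data(p_jit)
max(abs(jit$x - sw$height), na.rm = TRUE)  # largest horizontal offset
all.equal(jit$y, sw$mass)                  # masses are untouched
```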

Multiple layers

Each layer can support local arguments and draw from the global settings. Below we use the geom_line() function, followed by the geom_point() function.

babynames %>%
  ggplot(aes(year, prop)) +
  geom_line(aes(color = sex)) +
  geom_point(alpha = 0.4, shape = "cross")

But there is more to that graph. Here is the full code:

library(babynames)
library(ggplot2)

babynames %>% 
  filter((name == "John" & sex == "M") | 
           (name == "Elizabeth" & sex == "F")) %>% 
  ggplot(aes(year, prop)) +
  geom_line(aes(color = sex)) +
  geom_point(alpha = 0.4, shape = "cross") +
  geom_text(data = . %>% filter(year == 1965), aes(label = name),
            nudge_y = .009) +
  labs(title = "Name Popularity") + 
  theme(legend.position = "none")
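The data = . %>% filter(year == 1965) argument works because a layer’s data can be a function. ggplot2 calls it on the plot’s global data and draws only the rows it returns. This minimal sketch applies the same idea to starwars. It labels characters taller than 220 cm (an arbitrary cut-off) and uses an ordinary R function instead of a magrittr pipe sequence. The nudge_y value of 8 is also arbitrary.

```r
library(ggplot2)
library(dplyr)

sw <- starwars %>% filter(mass < 500)

p_lab <- sw %>%
  ggplot(aes(height, mass)) +
  geom_point() +
  # data = a function: it receives the global data and returns the rows to label
  geom_text(data = function(d) filter(d, height > 220),
            aes(label = name), nudge_y = 8)
```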

Goal

Recall the goal stated at the beginning: a scatter plot with a regression line. The line is added as one more layer, in the form of another geom_ function: geom_smooth(). Setting method = lm requests a linear model, and se = FALSE hides the confidence band.

starwars %>% 
  filter(mass < 500) %>% 
  ggplot(aes(height, mass)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE)
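geom_smooth(method = lm) fits the same ordinary least-squares model you would get from lm(mass ~ height). Fitting it directly gives you the intercept and slope behind the line. Comparing the smooth layer’s y values with predict() is one way to confirm the two match. The names fit, p_fit, and line_pts are illustrative.

```r
library(ggplot2)
library(dplyr)

sw <- starwars %>% filter(mass < 500)

# Fit the model explicitly; lm() drops rows with a missing height or mass
fit <- lm(mass ~ height, data = sw)
coef(fit)

p_fit <- sw %>%
  ggplot(aes(height, mass)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE)

# Layer 2 holds the fitted line evaluated on a grid of x values
line_pts <- layer_data(p_fit, 2)
all.equal(line_pts$y, unname(predict(fit, data.frame(height = line_pts$x))))
```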

Arrange order

Categorical values are most easily ordered with the forcats package. Part of the tidyverse, forcats converts character data into factors, i.e. categorical data, whose levels have an explicit order that ggplot2 follows when drawing axes and legends.

msleep %>% 
  ggplot(aes(vore)) +
  geom_bar()

forcats::fct_infreq()

Change the order of the bars by the frequency of observations.

msleep %>% 
  ggplot(aes(fct_infreq(vore))) +
  geom_bar() 
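fct_infreq() reorders a factor’s levels from most to least frequent, and ggplot2 draws the bars in level order. Inspecting the levels directly shows the order the bar chart uses. The name vore_by_freq is illustrative.

```r
library(ggplot2)  # msleep ships with ggplot2
library(forcats)

vore_by_freq <- fct_infreq(msleep$vore)

# Levels run from most to least frequent; NA values are not a level
levels(vore_by_freq)
table(vore_by_freq)
```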

Notice below, we use the fill = argument to set the color of the bars. In the scatter plot, above, we used the color = argument. For many geoms you can use both color and fill. How do these arguments differ? Where can you look to find out more about fill and color?

starwars %>% 
  ggplot(aes(fct_rev(fct_infreq(eye_color)))) +
  geom_bar(fill = "grey70") +
  geom_bar(data = starwars %>% filter(eye_color == "orange"), fill = "darkorange") +
  coord_flip()

Facet wrap

Faceting is a great way to make subplots of the same dataframe, one per value (or combination of values) of categorical variables. See both facet_wrap() and facet_grid().

mpg %>% 
  ggplot(aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ class)
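facet_wrap() lays out one panel per value of a single variable. facet_grid() arranges panels in rows and columns defined by two variables. This sketch builds both from mpg. The drv ~ cyl formula is an arbitrary example, and the PANEL column of layer_data() records which subplot each point lands in.

```r
library(ggplot2)
library(dplyr)

# facet_wrap(): one panel per vehicle class, wrapped into a grid
p_wrap <- mpg %>%
  ggplot(aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ class)

# facet_grid(): rows by drive train, columns by number of cylinders
p_grid <- mpg %>%
  ggplot(aes(displ, hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)

# Number of panels that contain data in each plot
n_distinct(layer_data(p_wrap)$PANEL)
n_distinct(layer_data(p_grid)$PANEL)
```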

Scales

I’ll briefly introduce scales, which control how data values are translated into visual properties. In this case, scales are used to change the colors of the plot. Read more about scales.

Viridis scales apply color palettes to continuous, discrete, or binned data: scale_fill_viridis_c(), scale_fill_viridis_d(), and scale_fill_viridis_b(), respectively.

msleep %>% 
  ggplot(aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_viridis_d(na.value = "grey80")

ColorBrewer palettes work similarly but offer a wider array of palettes to choose from.

msleep %>% 
  ggplot(aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_brewer(type = "qual", na.value = "grey80") 

To find available colors, search the web for “R color names”. For ColorBrewer specifically, display every palette:

#display.brewer.pal(7,"Dark2")
RColorBrewer::display.brewer.all()
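RColorBrewer::brewer.pal() returns a palette’s colors as hex codes. You can pass those codes to scale_fill_manual() when you want ColorBrewer colors but also want control over which color goes where. Taking six colors from "Dark2" is an arbitrary choice here.

```r
library(RColorBrewer)

# Six hex codes from the qualitative "Dark2" palette
dark2 <- brewer.pal(6, "Dark2")
dark2
```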

Sometimes a manual scale, where you supply the colors yourself, is preferred. Again, a web search for “R color names” turns up helpful documentation.

mycolors <- c("firebrick", "forestgreen", "navy", "darkorange", 
               "goldenrod", "sienna")

msleep %>% 
  ggplot(aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_manual(values = mycolors, na.value = "grey80") 
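An unnamed vector like mycolors is matched to the levels in order (usually alphabetical). A named vector pins each color to a specific level instead, so the colors stay put even if the data change. This sketch assumes msleep’s conservation codes are cd, domesticated, en, lc, nt, and vu. The color for each code is a hypothetical choice.

```r
library(ggplot2)
library(forcats)

# Hypothetical color assignments, keyed by msleep's conservation codes
mycolors_named <- c(cd = "firebrick", domesticated = "forestgreen",
                    en = "navy", lc = "darkorange",
                    nt = "goldenrod", vu = "sienna")

p_manual <- ggplot(msleep, aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  # Names are matched to levels; NA conservation values get na.value
  scale_fill_manual(values = mycolors_named, na.value = "grey80")

unique(layer_data(p_manual)$fill)
```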

Scales control how data values are translated into visual properties. Beyond modifying colors, another example is a logarithmic scale, which compensates for skewed data and can clarify the data pattern. For example, using the ChickWeight dataset, we visualize the weights of the chicks over time.

data("ChickWeight")

ChickWeight %>% 
  ggplot(aes(Time, weight, color = Diet)) +
  geom_line(aes(group = Chick))

Using scale_y_log10(), we put the y axis on a logarithmic scale. This spreads out the low weights and compresses the high ones, making the growth pattern easier to read.

chicken_plot <- ChickWeight %>% 
  ggplot(aes(Time, weight, color = Diet)) +
  geom_line(aes(group = Chick)) +
  scale_y_log10()
chicken_plot
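Scale transformations such as scale_y_log10() are applied to the data before any statistics are computed or anything is drawn. The axis labels are then shown back on the original scale (grams). This sketch rebuilds the chicken plot and compares its computed y values with log10(weight) to make that visible.

```r
library(ggplot2)

chicken_plot <- ggplot(ChickWeight, aes(Time, weight, color = Diet)) +
  geom_line(aes(group = Chick)) +
  scale_y_log10()

# The layer data already holds log10(weight), not weight in grams
range(layer_data(chicken_plot)$y)
log10(range(ChickWeight$weight))
```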

Labels

The labs() function is a specialized function for setting labels. For example, use labs() to add a title, subtitle, and caption, to set the legend title, and to modify the axis labels. See more on scales.

plot_sleep <- msleep %>% 
  mutate(vore = case_when(
    vore == "herbi" ~ "Herbivore",
    vore == "omni"  ~ "Omnivore",
    vore == "carni" ~ "Carnivore",
    vore == "insecti" ~ "Insectivore"
  ))  %>%
  ggplot(aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_brewer(type = "qual", na.value = "grey80") +
  labs(title = "Animal sleep times", 
       subtitle = "A practice dataset",
       fill = "Conservation\nType",
       x = "",
       y = "Sleep time in hours",
       caption = "Source: ggplot2::msleep")

plot_sleep

Themes

Themes are used to manipulate the stylistic characteristics of the non-data components of your plot, such as font faces, text sizes, and grid lines. ProTip: quickly restyle a single plot with preset themes such as theme_dark(), or use a specialized theme extension such as theme_ipsum() from the hrbrthemes package.

See more on themes

Example themes

[Figure: ggplot2 themes]

Image source: from R for Data Science by Grolemund & Wickham

theme_dark()

plot_sleep +
  theme_dark()

theme_classic()

plot_sleep +
  theme_classic()
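A preset theme is often just a starting point. Adding theme() afterwards overrides individual non-data elements. Order matters: a preset theme added later would reset those tweaks. This sketch uses a simplified version of the sleep plot, and the specific settings (bottom legend, bold 16 pt title, angled axis text) are arbitrary examples.

```r
library(ggplot2)

p_theme <- ggplot(msleep, aes(vore, sleep_total)) +
  geom_col(aes(fill = conservation)) +
  labs(title = "Animal sleep times") +
  # start from a preset theme ...
  theme_classic() +
  # ... then override individual non-data elements
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = 16),
        axis.text.x = element_text(angle = 45, hjust = 1))

# Converting to a gtable renders the plot's layout without displaying it
g <- ggplotGrob(p_theme)
```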

hrbrthemes

https://cinc.rud.is/web/packages/hrbrthemes/

plot_sleep +
  hrbrthemes::theme_ipsum(grid = "Y") +
  hrbrthemes::scale_fill_ipsum(na.value = "grey80",
                               labels = c("Critical", "Domesticated", 
                                          "Endangered", "Least Concern", 
                                          "Threatened", "Vulnerable")) +
  theme(plot.title.position = "plot")

Combine plots

The patchwork package makes it “ridiculously simple to combine separate ggplot objects into the same graphic.” See more about patchwork

# install.packages("patchwork")                     # CRAN release
# devtools::install_github("thomasp85/patchwork")   # development version
# https://patchwork.data-imaginist.com/
library(patchwork)

(plot_sleep / chicken_plot)
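Besides / for stacking, patchwork uses | to place plots side by side. plot_annotation() adds shared elements such as panel tags. This sketch uses three small stand-in plots, and tag_levels = "A" is one of several tagging options.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(displ, hwy)) + geom_point()
p2 <- ggplot(mpg, aes(class)) + geom_bar()
p3 <- ggplot(ChickWeight, aes(Time, weight, group = Chick)) + geom_line()

# p1 and p2 side by side on top, p3 underneath, tagged A, B, C
combined <- (p1 | p2) / p3 +
  plot_annotation(tag_levels = "A")
combined
```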

Interactive plots

Use the ggplotly() function from the plotly package to transform your static plot into an interactive plot that can be used in dashboards and web presentations.

See more at the Plotly ggplot2 Library page, and the Interactive web-based data visualization with R, plotly, and shiny book.

library(plotly)
ggplotly(plot_sleep)

Animate plots

Use the gganimate package to bring your plot to life through animation. Learn more at the resource page for gganimate.

For example:

[Figure: gganimate example]

Image source: https://gganimate.com/index.html#yet-another-example

Reinforce your learning

On your own…

Interactive Exercises from RStudio Primers – Visualization

Angela Zoss code exercises